Análisis Genómicos y Transcriptómicos con plataforma NGS

Tabla de Contenidos


Instalación de paquetes desde CRAN

install.packages("R.utils")       # <- uncompress .gz files
install.packages("BiocManager")   # <- Bioconductor manager

Instalación de Bioconductor

BiocManager::install()

Verificar la version de Bioconductor que estamos usando y si los paquetes están actualizados:

BiocManager::version()
BiocManager::valid()

 

Los módulos de trabajo son lo siguientes:

  • análisis de secuencias cortas (shortRead)
  • mapeo de secuencias cortas a un genoma (alignment)
  • análisis de expresión diferencial (DEA)

 

ShortRead

  *Analisis de genomas y transcriptomas con plataforma NGS
  *Master Biotecnologia
  *2025
  
  Jose V. Die, Dept. Genetics, UCO
  jose.die@uco.es

Instalación del paquete que usaremos en este módulo.

BiocManager::install("ShortRead")       # <- manipulate FASTQ files

Una vez instalados los paquetes necesarios para este módulo, cargamos las librerías.

# Dependencies
library(R.utils)
library(ShortRead)

Descarga ficheros

Estructura del proyecto

dir.create("sreads")          # create a directory/folder in the indicated path
dir.create("sreads/fastq")    # create a directory/folder in the indicated path
dir.create("sreads/genome")   # create a directory/folder in the indicated path
dir.create("sreads/results")  # create a directory/folder in the indicated path
dir("sreads")                 # see files and folders in the indicated path)

Descarga ficheros secuencia genómica

download.file("https://www.dropbox.com/s/4ft480eky7kghzw/Saccharomyces_cerevisiae_genome.fa.gz?dl=1", 
              destfile = "sreads/genome/Saccharomyces_cerevisiae_genome.fa.gz")

Descarga de secuencias cortas

download.file(url = "https://www.dropbox.com/scl/fi/662wqxas24h0mdzadfffh/ENASRR23944525.fastq?rlkey=a3u8htlieoyeh4o9dgplax831&dl=1", 
destfile = "sreads/fastq/SRR23944525.fastq.gz")

Descompresión de las secuencias fasta.gz y fastq.gz

# Uncompress fasta.gz & fastq.gz files
R.utils::gunzip("sreads/genome/Saccharomyces_cerevisiae_genome.fa.gz", remove = FALSE)
R.utils::gunzip("sreads/fastq/SRR23944525.fastq.gz", remove = FALSE)

En este módulo queremos tener un visión general de las posibilidades que ofrece el paquete ShortReads para el tratamiento de secuencias cortas (ej. lecturas generadas por un secuenciador tipo Illumina). Estas lecturas se presentan en un fichero con formato fastq pero el mismo paquete permite también el tratamiento de secuencias almacenadas en un fichero fasta

formato fasta

# # Read fasta : Saccharomyces genome
fa_sample <- readFasta(dirPath = "sreads/genome/", pattern = "genome")
fa_sample <- readFasta(dirPath = "sreads/genome/", pattern = ".fa$")
# # Print sample
fa_sample
## class: ShortRead
## length: 17 reads; width: 85779..1531933 cycles

Podemos acceder a las funciones (methods) disponibles para un objeto de clase ShortRead con la siguiente llamada:

# # Methods accessors 
methods(class = "ShortRead")
##  [1] [                 alphabetByCycle   append            clean            
##  [5] detail            dustyScore        id                length           
##  [9] narrow            pairwiseAlignment show              srdistance       
## [13] srduplicated      sread             srorder           srrank           
## [17] srsort            tables            trimEnds          trimLRPatterns   
## [21] width             writeFasta       
## see '?methods' for accessing help and source code

Veamos el resultado de aplicar alguno de los métodos disponibles :

length(fa_sample)
## [1] 34
width(fa_sample)
##  [1]  230218  813184  316620 1531933  576874  270161 1090940  562643  439888
## [10]  745751  666816 1078177  924431  784333 1091291  948066   85779  230218
## [19]  813184  316620 1531933  576874  270161 1090940  562643  439888  745751
## [28]  666816 1078177  924431  784333 1091291  948066   85779
id(fa_sample)
## BStringSet object of length 34:
##      width seq
##  [1]    52 I dna:chromosome chromosome:R64-1-1:I:1:230218:1 REF
##  [2]    54 II dna:chromosome chromosome:R64-1-1:II:1:813184:1 REF
##  [3]    56 III dna:chromosome chromosome:R64-1-1:III:1:316620:1 REF
##  [4]    55 IV dna:chromosome chromosome:R64-1-1:IV:1:1531933:1 REF
##  [5]    52 V dna:chromosome chromosome:R64-1-1:V:1:576874:1 REF
##  ...   ... ...
## [30]    58 XIII dna:chromosome chromosome:R64-1-1:XIII:1:924431:1 REF
## [31]    56 XIV dna:chromosome chromosome:R64-1-1:XIV:1:784333:1 REF
## [32]    55 XV dna:chromosome chromosome:R64-1-1:XV:1:1091291:1 REF
## [33]    56 XVI dna:chromosome chromosome:R64-1-1:XVI:1:948066:1 REF
## [34]    57 Mito dna:chromosome chromosome:R64-1-1:Mito:1:85779:1 REF
sread(fa_sample)
## DNAStringSet object of length 34:
##        width seq
##  [1]  230218 CCACACCACACCCACACACCCACACACCACAC...TGTGGGTGTGGTGTGGGTGTGGTGTGTGTGGG
##  [2]  813184 AAATAGCCCTCATGTACGTCTCCTCCAAGCCC...GTGGTGTGTGTGGGTGTGGTGTGTGGGTGTGT
##  [3]  316620 CCCACACACCACACCCACACCACACCCACACA...GGTGTGTGGGTGTGGTGGGTGTGGTGTGTGTG
##  [4] 1531933 ACACCACACCCACACCACACCCACACACACCA...AACATAAAATAAAGGTAGTAAGTAGCTTTTGG
##  [5]  576874 CGTCTCCTCCAAGCCCTGTTGTCTCTTACCCG...AAGAACAGGGTTTCATTTTCATTTTTTTTTTT
##  ...     ... ...
## [30]  924431 CCACACACACACCACACCCACACCACACCCAC...GGGTGTGGTGTGGGTGTGGTGTGTGTGTGGGG
## [31]  784333 CCGGCTTTCTGACCGAAATTAAAAAAAAAAAA...GGTGTGTGGGTGTGTGTGGGTGTGGTGTGGGT
## [32] 1091291 ACACCACACCCACACCACACCCACACCCACAC...TAGATGTGAGAGAGTGTGTGGGTGTGGTGTGT
## [33]  948066 AAATAGCCCTCATGTACGTCTCCTCCAAGCCC...TTTCATTTTTTTTTTTTAATTTCGGTCAGAAA
## [34]   85779 TTCATAATTAATTTTTTATATATATATTATAT...GAAATATGCTTAATTATAATATAATATCCATA

Podemos extraer la secuencia de nucleótidos de un objeto de clase DNAStringSet :

# # Extract sequence from DNAStringSet object (100 first letters)
substr(toString(sread(fa_sample)[1]), 1, 100)
## [1] "CCACACCACACCCACACACCCACACACCACACCACACACCACACCACACCCACACACACACATCCTAACACTACCCTAACACAGCCCTAATCTAACCCTG"

Otra forma de extraer la secuencia :

as.character(sread(fa_sample)[1])

El objeto de clase ShortRead puede escribirse en formato fasta :

# # Write a ShortRead object 
writeFasta(object = fa_sample, file = "sreads/results/Scerevisiae_genome.fasta")

formato fastq

# # Read fastq: Beauveria bassiana reads
#fq_sample <- readFastq(dirPath = "fastq", pattern = "fastq")
fq_sample <- readFastq(dirPath = "sreads/fastq/SRR23944525.fastq.gz")
# # Print sample
fq_sample
## class: ShortReadQ
## length: 603239 reads; width: 35..76 cycles
# counts the nubmer of records, nucleotides, and base-level quality scores
countFastq(dirPath = "sreads/fastq/SRR23944525.fastq")
##                   records nucleotides   scores
## SRR23944525.fastq  603239    45656825 45656825

Podemos acceder a las funciones (methods) disponibles para un objeto de clase ShortReadQ con la siguiente llamada:

# # Methods accessors 
methods(class = "ShortReadQ")
##  [1] [                 [<-               alphabetByCycle   alphabetScore    
##  [5] append            clean             coerce            detail           
##  [9] dustyScore        id                length            narrow           
## [13] pairwiseAlignment qa                reverse           reverseComplement
## [17] show              srdistance        srduplicated      sread            
## [21] srorder           srrank            srsort            tables           
## [25] trimEnds          trimLRPatterns    trimTails         trimTailw        
## [29] width             writeFasta        writeFastq       
## see '?methods' for accessing help and source code

Veamos el resultado de aplicar alguno de los métodos disponibles :

sread(fq_sample)[1:20]
## DNAStringSet object of length 20:
##      width seq
##  [1]    75 GGTGCAAGGGGGTCGGAGTTGATGGCGGAGTGG...CATCGGCAGGCTGCGTTGGCAGATGCCTTCTTT
##  [2]    76 CAATACTCGGTCAAAAAAAAAAAAAAAAAAGGA...GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
##  [3]    76 CAAAGTGCTTGGGCTCCAGGGGGAGTATGGTCG...ACTTAAAAAAAAAAAAAAGATCGGAAGAGGGGG
##  [4]    75 CTGGCTACGAGAGCATCAAGCGCGTGGCTTTGA...TCAAGCCGCGCGTGCGCTGGTGACGCGACGTCG
##  [5]    74 GGTGAGGCCGGCTATCTGAGTTTCATCGTGCAA...CCACAATTGACGTTGCATTCAACGTCTGTGCAT
##  ...   ... ...
## [16]    76 AGTGAGGTGTTGGCGGCGTTGCAGATGAGCATA...GCCAAGTCGAGTAACGAGAGATGTGGCGAGTCA
## [17]    75 ACATGCTTCGAGGTCCAAGCTGTGAGGTTTACT...AGGGATACCCGGATGGCGTTGATGAATTATGGG
## [18]    74 CGGGCCCGAGATTCCAGTGTTTGGCATTGACAT...ACGCCTACTACTTGCAGTATCTCAACGGCAAGG
## [19]    76 TTATTAAATAAAAAAAAAACGTATACTACCCTT...TATAACTTTACAGCCAATTTCACAATAATATTA
## [20]    76 GAGGAGTGGCGCAAGTAGCATCGGCAGGCTGCG...CCTTCTTTCTATGCAATCTGATATAGTATGAAA
width(fq_sample)[1:20]
##  [1] 75 76 76 75 74 76 76 76 74 75 76 76 76 76 76 76 75 74 76 76
sum(width(fq_sample)) / 1e6 
## [1] 45.65682
# comparar con slide "entrez" : 45.7M
# https://www.ncbi.nlm.nih.gov/sra/SRR23944525
hist(width(fq_sample), main = "Reads length")
abline(v = 75, col = "red", lwd = 2)

Podemos filtrar las lecturas que tengan un tamaño mínimo :

fqsample_size <- fq_sample[width(fq_sample) >= 75]
hist(width(fqsample_size), main = "Reads_filtered length")

Número de lecturas originales y lecturas filtradas por tamaño:

length(fq_sample) / 1e3
## [1] 603.239
length(fqsample_size) / 1e3
## [1] 567.342

El objeto de clase ShortReadQ puede escribirse en formato fastq:

# # Write a ShortRead object 
writeFasta(object = fqsample_size, file = "sreads/results/reads75.fastq")

 

Quality assessment

# get quality scores per base (for all sequences in the fastq files!) as a matrix
qPerBase = as(quality(fq_sample), "matrix")

Ahora podemos repasar la calidad de una o de varias secuencias. Por ejemplo, los primeros 20 nucleótidos de las 3 primeras lecturas que aparecen en el fichero .fastq:

(Remember : a score of 30 is considered a good quality as it means the accuracy of base call is 99.9%)

# Check quality of one/some sequence(s)
qPerBase[1:3, 1:20] 
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]   32   32   32   32   32   36   36   36   36    36    36    36    36    36
## [2,]   14   32   32   14   32   14   14   32   14    36    36    36    36    32
## [3,]   21   32   32   32   21   36   14   14   36    36    21    21    21    32
##      [,15] [,16] [,17] [,18] [,19] [,20]
## [1,]    36    36    36    36    36    36
## [2,]    14    14    36    36    14    36
## [3,]    14    36    36    36    14    14

Podemos filtrar las lecturas que tengan una calidad mínima :

  • por cada secuencia : sumar bases < calidad mínima
  • extraer las secuencias donde el paso anterior = 0
# get number of bases per read that have quality score below 30
qcount = rowSums(qPerBase < 30, na.rm = TRUE) 

# Number of reads where all Phred scores > 30
# qcount == 0 , meaning all quality scores per base in the read >= 30
fqsample_qfiltered = fq_sample[qcount == 0]

Podemos filtrar las lecturas que tengan un tamaño mínimo :

fqsample_size <- fq_sample[width(fq_sample) >= 75]
hist(width(fqsample_size), main = "Reads_filtered length", xlab = "width (bp)")

Número de lecturas originales y lecturas filtradas por tamaño:

## N. total de lecturas originales: 603.239
## N. total de lecturas filtradas por tamaño: 567.342

El objeto de clase ShortReadQ puede escribirse en formato fastq:

# # Write a ShortRead object 
writeFasta(object = fqsample150, file = "sreads/results/reads150.fastq")

Quality assessment

# get quality scores per base (for all sequences in the fastq files!) as a matrix
qPerBase = as(quality(fq_sample), "matrix")

Ahora podemos repasar la calidad de una o de varias secuencias. Por ejemplo, los primeros 20 nucleótidos de las 3 primeras lecturas que aparecen en el fichero .fastq:

(Remember : a score of 30 is considered a good quality as it means the accuracy of base call is 99.9%)

# Check quality of one/some sequence(s)
qPerBase[1:3, 1:20] 
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
## [1,]   32   32   32   32   32   36   36   36   36    36    36    36    36    36
## [2,]   14   32   32   14   32   14   14   32   14    36    36    36    36    32
## [3,]   21   32   32   32   21   36   14   14   36    36    21    21    21    32
##      [,15] [,16] [,17] [,18] [,19] [,20]
## [1,]    36    36    36    36    36    36
## [2,]    14    14    36    36    14    36
## [3,]    14    36    36    36    14    14

Podemos filtrar las lecturas que tengan una calidad mínima :

  • por cada secuencia : sumar bases < calidad mínima
  • extraer las secuencias donde el paso anterior = 0
# get number of bases per read that have quality score below 30
qcount = rowSums(qPerBase < 30, na.rm = TRUE) 

# Number of reads where all Phred scores > 30
# qcount == 0 , meaning all quality scores per base in the read >= 30
fqsample_qfiltered = fq_sample[qcount == 0]

Calidad general de las lecturas : representación visual.

#fastq original reads (non-filtered): 
qaSummary = qa(fq_sample, lane = 1)
df = qaSummary[["readQualityScore"]]

#fastq filtered reads by quality  
qaSummary_q = qa(fqsample_qfiltered, lane = 1)
df_q = qaSummary_q[["readQualityScore"]]

# Plotting base quality 
par(mfrow=c(1,2))
ShortRead:::.plotReadQuality(df[df$type=="read",])

ShortRead:::.plotReadQuality(df_q[df$type=="read",])

(Lanes with consistently good quality reads have strong peaks at the right of the panel).

De nuevo, podemos escribir en formato fastq el objeto de clase ShortReadQ que contiene las lecturas filtradas con la calidad mínima seleccionada :

writeFastq(object = fqsample_qfiltered, 
           file = "sreads/results/reads_Qfiltered.fastq")

Calidad general de las lecturas : número de secuencias.

## N. total de lecturas en dataset original: 603.239
## N. total de lecturas en dataset filtrado por calidad: 159.777

Calidad general de las lecturas : estadística descriptiva.

# summaries of reads in non-filtered and filtered datasets 
summary((width(fq_sample)))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   35.00   76.00   76.00   75.69   76.00   76.00
summary((width(fqsample_qfiltered)))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   64.00   76.00   76.00   75.67   76.00   76.00
# histograms of reads in non-filtered and filtered datasets 
par(mfrow=c(1,2))
hist(width(fq_sample), main = "non-filtered reads", xlab = "length")
hist(width(fqsample_qfiltered), main = "filtered reads", xlab = "length")


Alineamiento

  *Analisis de genomas y transcriptomas con plataforma NGS
  *Master Biotecnología
  *2025
  
  Jose V. Die, Dept. Genetics, UCO
  jose.die@uco.es

En este módulo trabajaremos con lecturas de Saccharomyces cerevisiae generadas en un experimento RNA-Seq. Las lecturas obtenidas son del tipo paired-end reads donde cada fragmento está secuenciado por los dos extremos, generando una estructura de ficheros de este tipo:

  • FFFF_1.fastq.gz
  • FFFF_2.fastq.gz

Además, trabajaremos con un formato nuevo de fichero (.gff3) que contiene información de anotación genómica.

 

Formatos ficheros :

  • .FASTA (.FA,.FNA): biological sequences (eg. chromosomes)
  • .FASTQ (.FQ): short-reads (eg. illumina)
  • .GFF (.GFF3, .GTF): genomic elements coordinates (eg. genes)

 

Para completar este módulo se realizarán las siguientes tareas :

  • descarga anotación genoma S. cerevisiae (.gff3)
  • descarga de secuencias RNA-Seq (SRR9336468)
  • descompresión de ficheros (.gz)
  • alineamiento BLAST
  • genome mapping alignment
    1. index the reference genome
    2. align fastq files
    3. convert .sam files into .bam files
  • IGV : visual exploration of genomic data
  • filtering reads by mapping quality

Mapeador : Bowtie2

Visualización datos genómicos : IGV

 

Instalación del paquete que usaremos en este módulo.

BiocManager::install("Rbowtie2")        # <- align FASTQ files
BiocManager::install("Rsamtools")       # <- manipulate SAM files

Una vez instalados los paquetes necesarios para este módulo, cargamos las librerías.

# Dependencies
library(Rbowtie2)
library(Rsamtools)

Descarga ficheros

Estructura del proyecto

# check folders - if FALSE, create the folders
dir.exists("mapping") 
dir.exists("mapping/fastq")     
dir.exists("mapping/genome")
dir.exists("mapping/bam")
dir("mapping")                   # see files and folders in the indicated path)

Descarga ficheros secuencia genómica

# download if the file is not in the folder
if (!file.exists("mapping/genome/Saccharomyces_cerevisiae_genome.gff3")) {
download.file("https://www.dropbox.com/s/qtaret1hrbvw2xb/Saccharomyces_cerevisiae_genome.gff3.gz?dl=1",
              destfile = "mapping/genome/Saccharomyces_cerevisiae_genome.gff3.gz")
}

También necesitamos el fichero que contiene la secuencia del genoma de S. cerevisiae (fasta). Como lo descargamos en el módulo anterior, podemos copiarlo desde allí.

file.copy(from = "sreads/genome/Saccharomyces_cerevisiae_genome.fa", to = "mapping/genome/" )

Descarga de secuencias cortas

download.file("https://www.dropbox.com/s/v06um7vt9ojdf42/NGS_MB_SRR9336468_1.fastq.gz?dl=1",
              destfile = "mapping/fastq/1M_SRR9336468_1.fastq.gz")

download.file("https://www.dropbox.com/s/kgxfth8ra675ccu/NGS_MB_SRR9336468_2.fastq.gz?dl=1",
              destfile = "mapping/fastq/1M_SRR9336468_2.fastq.gz")

Descompresión de las secuencias fasta.gz y fastq.gz

# Uncompress fasta.gz & fastq.gz files
#R.utils::gunzip("mapping/genome/Saccharomyces_cerevisiae_genome.fa.gz", remove=TRUE)
R.utils::gunzip("mapping/genome/Saccharomyces_cerevisiae_genome.gff3.gz", remove=TRUE)
R.utils::gunzip("mapping/fastq/1M_SRR9336468_1.fastq.gz", remove=TRUE)
R.utils::gunzip("mapping/fastq/1M_SRR9336468_2.fastq.gz", remove=TRUE)

En una consola LINUX podemos observar el fichero de anotación gff3 y los ficheros fastq.

system("grep -v '#' genes/Saccharomyces_cerevisiae_genome.gff3")
# # Total number of reads ?
system("wc -l fastq/1M_SRR9336468_1.fastq")
# # Look at the two reads from the same cDNA fragment
system("head -n 4 fastq/1M_SRR9336468_1.fastq")

BLAST genome

A modo ilustrativo, vamos a alinear la primera secuencia de cada fichero contra la secuencia de referencia del genoma de Saccharomices.

Comenzaremos con la secuencia @SRR9336468.1 1/2 y analizaremos en el resultado del BLAST el sitio único del alineamiento o mapeo, las coordenadas del intervalo, la coordenada de la primera base de la secuencia y la orientación de la cadena.

Del mismo modo, seguiremos con la secuencia @SRR9336468.1 1/1.

 

Run blast genome : saccharomices

Genome mapping alignment

  1. Index the reference genome
# # use the indexer `bowtie2_build`
bowtie2_build(references = "mapping/genome/Saccharomyces_cerevisiae_genome.fa",
              bt2Index = "mapping/genome/Saccha_genome",
              overwrite = TRUE)
# # create `.fai` file with `indexFa` from `Rsamtools`
indexFa("mapping/genome/Saccharomyces_cerevisiae_genome.fa")
  1. Align .fastq files against the indexed genome
# # align fastq files against the genome with `bowtie2`
bowtie2(bt2Index = "mapping/genome/Saccha_genome",
        samOutput = "mapping/bam/SRR9336468.sam",
        seq1 = "mapping/fastq/1M_SRR9336468_1.fastq",
        seq2 = "mapping/fastq/1M_SRR9336468_2.fastq",
        overwrite = TRUE,
        "--threads=3")


# Time difference of 2.249662 mins
  1. Convert .sam files into bam files (more compact and indexed)
# convert SAM file into BAM file (and indexes .bai) with "asBam"
asBam("mapping/bam/SRR9336468.sam")

# Time difference of 1.345935 mins 

Una vez se tiene el .bam, el fichero .samse puede borrar ya que los ficheros .sam son los más prescindibles y los que más ocupan.

file.remove("mapping/bam/SRR9336468.sam")  

IGV

  • Load genome : Genome / Local File / chromosomes / .fa + .fai
  • Load genes : Tracks / Local File (gff3)
  • View options
  • Load reads : Tracks / Load File / select bam + index (bai)

Filtering reads by mapping quality

Read the .bam file

bamfile <- BamFile("mapping/bam/SRR9336468.bam")
bamfile
## class: BamFile 
## path: mapping/bam/SRR9336468.bam
## index: NA
## isOpen: FALSE 
## yieldSize: NA 
## obeyQname: FALSE 
## asMates: FALSE 
## qnamePrefixEnd: NA 
## qnameSuffixStart: NA

Podemos acceder a las funciones (methods) disponibles para un objeto de clase BamFile con la siguiente llamada:

# # Methods accessors 
methods(class = "BamFile")
##  [1] $                   $<-                 asMates            
##  [4] asMates<-           assayData<-         close              
##  [7] coerce              contents            countBam           
## [10] coverage            dbconn              dbfile             
## [13] ExpressionSet       filterBam           findSpliceOverlaps 
## [16] idxstatsBam         index               index<-            
## [19] indexBam            initialize          isIncomplete       
## [22] isOpen              obeyQname           obeyQname<-        
## [25] open                path                pileup             
## [28] qnamePrefixEnd      qnamePrefixEnd<-    qnameSuffixStart   
## [31] qnameSuffixStart<-  quickBamFlagSummary readGAlignmentPairs
## [34] readGAlignments     readGAlignmentsList readGappedReads    
## [37] scanBam             scanBamHeader       seqinfo            
## [40] show                sortBam             summarizeOverlaps  
## [43] testPairedEndBam    updateObject        yieldSize          
## [46] yieldSize<-        
## see '?methods' for accessing help and source code
seqinfo(bamfile)
## Seqinfo object with 17 sequences from an unspecified genome:
##   seqnames seqlengths isCircular genome
##   I            230218       <NA>   <NA>
##   II           813184       <NA>   <NA>
##   III          316620       <NA>   <NA>
##   IV          1531933       <NA>   <NA>
##   V            576874       <NA>   <NA>
##   ...             ...        ...    ...
##   XIII         924431       <NA>   <NA>
##   XIV          784333       <NA>   <NA>
##   XV          1091291       <NA>   <NA>
##   XVI          948066       <NA>   <NA>
##   Mito          85779       <NA>   <NA>

Set the filter condition

# # build the filtered : .bam + .bai files 
param = ScanBamParam(mapqFilter = 35)     # min quality = 35 * 
dest = filterBam(file = bamfile, destination = "mapping/bam/mapq.bam", param = param) 

# en el próximo módulo veremos que también podemos controlar la calidad del alineamiento `Deseq2` con la `featureCounts` y la variable 'minMQS'

Count number of alignments

bamq <- BamFile("mapping/bam/mapq.bam")
countBam(bamfile)
##   space start end width           file records nucleotides
## 1    NA    NA  NA    NA SRR9336468.bam 2000000       3e+08
countBam(bamq)
##   space start end width     file records nucleotides
## 1    NA    NA  NA    NA mapq.bam 1619149   242872350

Manipulating .bamfiles with scanBam

# # scanBam generates a list (length = 1) with 13 elements
aln <- scanBam(bamfile)
aln35 = scanBam(bamq)
# # sequence reads
aln[[1]]$seq[1:10]
aln35[[1]]$seq[1:10]
# # mapping quality reads
aln[[1]]$mapq[1:10]
aln35[[1]]$mapq[1:10]
# # descriptive statistics
summary(aln[[1]]$mapq)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   42.00   42.00   37.37   42.00   42.00  104977
summary(aln35[[1]]$mapq)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   35.00   42.00   42.00   41.85   42.00   42.00
# # check number of reads based on a condition
sum(aln[[1]]$mapq < 35, na.rm = TRUE)
## [1] 275874
sum(aln35[[1]]$mapq < 35, na.rm = TRUE)
## [1] 0

DEA

  *Analisis de genomas y transcriptomas con plataforma NGS
  *Master Biotecnología
  *2025
  
  Jose V. Die, Dept. Genetics, UCO
  jose.die@uco.es

En este módulo trabajaremos con las lecturas de Saccharomyces cerevisiae generadas en el experimento RNA-Seq del módulo anterior. Las lecturas obtenidas son del tipo paired-end reads donde cada fragmento está secuenciado por los dos extremos. El trabajo original puede consultarse en PubMed.

La tabla de metadatos es la siguiente : Bioproject : PRJNA550078.
Publicacion : 2020-02-27.

SampleID Description Replicate Millions of Spots Biosample
SRR9336468 pH5 CO2 0.04% R1 23.8 SAMN12107229
SRR9336469 pH5 CO2 0.04% R2 21.9 SAMN12107228
SRR9336470 pH5 CO2 0.04% R3 21.6 SAMN12107227
SRR9336471 pH5 CO2 50% R1 23.1 SAMN12107226
SRR9336472 pH5 CO2 50% R2 23.4 SAMN12107225
SRR9336473 pH5 CO2 50% R3 20.5 SAMN12107224
SRR9336474 pH3 CO2 0.04% R1 23.0 SAMN12107223
SRR9336475 pH3 CO2 0.04% R2 20.9 SAMN12107248
SRR9336476 pH3 CO2 0.04% R3 21.4 SAMN12107244

La primera parte del módulo ya se ha realizado en la sección del alineamiento. Para completar este módulo se realizarán las siguientes tareas :

  • descarga anotación genoma S. cerevisiae (.gff3)
  • descarga de secuencias RNA-Seq (SRR9336468)
  • descompresión de ficheros (.gz)
  • genome mapping alignment
    1. index the reference genome
    2. align fastq files
    3. convert .sam files into .bam files
       
  • descarga .bam files
  • organize the data
  • DESeq2 object
  • normalize raw counts
  • clustering analyses
  • model fitting
  • model contrast
  • resuls table
  • visualizing results
  • DEG annotation

 

Papers de interés:

 

Instalación de los paquetes que usaremos en este módulo.

BiocManager::install("Rsubread")        # <- quantification of RNA-seq reads
BiocManager::install("DESeq2")          # <- method for DEA

Una vez instalados los paquetes necesarios para este módulo, cargamos las librerías.

# Dependencies
library(Rsamtools)
library(Rsubread)
library(DESeq2)

Descarga ficheros

Estructura del proyecto

# check folders - if FALSE, create the folders
dir.exists("dea")
dir.exists("dea/genome")
dir.exists("dea/fastq")
dir.exists("dea/bam")
dir("dea")                   # see files and folders in the indicated path)

Descarga ficheros secuencia genómica

Necesitamos los ficheros de la secuencia del genoma de S. cerevisiae (fasta) y el fichero de anotación (.gff3) que descargamos en el módulo anterior. Podemos copiarlos desde allí.

file.copy(from = "mapping/genome/Saccharomyces_cerevisiae_genome.fa", to = "dea/genome")
file.copy(from = "mapping/genome/Saccharomyces_cerevisiae_genome.fa.fai", to = "dea/genome")
file.copy(from = "mapping/genome/Saccharomyces_cerevisiae_genome.gff3", to = "dea/genome")
dir("dea/genome")
## [1] "Saccharomyces_cerevisiae_genome.fa"  
## [2] "Saccharomyces_cerevisiae_genome.gff3"

Descarga de secuencias cortas Las lecturas de secuencia corta que necesitamos son las que hemos descargado en el módulo anterior y podemos copiarlas desde allí.

file.copy(from = "mapping/fastq/1M_SRR9336468_1.fastq", to = "dea/fastq")
file.copy(from = "mapping/fastq/1M_SRR9336468_2.fastq", to = "dea/fastq")
dir("dea/fastq")
## [1] "1M_SRR9336468_1.fastq" "1M_SRR9336468_2.fastq"

Genome mapping alignment

Este paso reproduce el alineamiento realizado en el módulo anterior módulo anterior con las secuencias ‘1M_SRR9336468_1.fastq’ y ‘1M_SRR9336468_2.fastq’ :

  1. Index the reference genome
  2. Align .fastq files against the indexed genome
  3. Convert .sam file into .bam file

El fichero ‘SRR9336468.bam’ que genera este proceso podemos copiarlo desde el caso anterior. En esta ocasión, vamos a llamarle ‘basal_R1.bam’.

file.copy(from = "mapping/bam/SRR9336468.bam", to = "dea/bam/basal_R1.bam")
file.copy(from = "mapping/bam/SRR9336468.bam.bai", to = "dea/bam/basal_R1.bam.bai")

Download the rest of BAM files :

download.file("https://www.dropbox.com/s/iez6egp57hazmyc/bam_files.zip?dl=1", 
              destfile = "dea/bam/bam_files.zip")

#Time difference of 1.007498 mins

Uncompress bam_files.zip

unzip("dea/bam/bam_files.zip",  exdir = "dea/bam" )

The zip file = 872.219 MB. However the dowloaded file = 659.2 MB. When the downloading step is not completed, for some reason, unzip does not work. In order to avoid problems, download the .zip file and uncompress manually.

Gene quantification

Quantify “genes” with `featureCounts .

# store all paths of bam files into "bamFiles" R object (use `list.files`)
bamFiles <- list.files(path = "dea/bam", pattern = "*.bam$", full.names = TRUE)
bamFiles
## [1] "dea/bam/basal_R1.bam"   "dea/bam/basal_R2.bam"   "dea/bam/basal_R3.bam"  
## [4] "dea/bam/indust1_R1.bam" "dea/bam/indust1_R2.bam" "dea/bam/indust1_R3.bam"
## [7] "dea/bam/indust2_R1.bam" "dea/bam/indust2_R2.bam" "dea/bam/indust2_R3.bam"
# quantify 'genes'
data <- Rsubread::featureCounts(
  files           = bamFiles,
  annot.ext       = "dea/genome/Saccharomyces_cerevisiae_genome.gff3",
  isGTFAnnotation = TRUE,
  GTF.featureType = "gene",
  GTF.attrType    = "ID",
  isPairedEnd     = TRUE,
  requireBothEndsMapped = TRUE,
  minMQS          = 30,
  strandSpecific  = 2,
  nthreads = 4) 
## 
##         ==========     _____ _    _ ____  _____  ______          _____  
##         =====         / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
##           =====      | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
##             ====      \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
##               ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
##         ==========   |_____/ \____/|____/|_|  \_\______/_/    \_\_____/
##        Rsubread 2.4.3
## 
## //========================== featureCounts setting ===========================\\
## ||                                                                            ||
## ||             Input files : 9 BAM files                                      ||
## ||                                                                            ||
## ||                           basal_R1.bam                                     ||
## ||                           basal_R2.bam                                     ||
## ||                           basal_R3.bam                                     ||
## ||                           indust1_R1.bam                                   ||
## ||                           indust1_R2.bam                                   ||
## ||                           indust1_R3.bam                                   ||
## ||                           indust2_R1.bam                                   ||
## ||                           indust2_R2.bam                                   ||
## ||                           indust2_R3.bam                                   ||
## ||                                                                            ||
## ||              Paired-end : yes                                              ||
## ||        Count read pairs : yes                                              ||
## ||              Annotation : Saccharomyces_cerevisiae_genome.gff3 (GTF)       ||
## ||      Dir for temp files : .                                                ||
## ||                 Threads : 4                                                ||
## ||                   Level : meta-feature level                               ||
## ||      Multimapping reads : counted                                          ||
## || Multi-overlapping reads : not counted                                      ||
## ||   Min overlapping bases : 1                                                ||
## ||                                                                            ||
## \\============================================================================//
## 
## //================================= Running ==================================\\
## ||                                                                            ||
## || Load annotation file Saccharomyces_cerevisiae_genome.gff3 ...              ||
## ||    Features : 6600                                                         ||
## ||    Meta-features : 6600                                                    ||
## ||    Chromosomes/contigs : 17                                                ||
## ||                                                                            ||
## || Process BAM file basal_R1.bam...                                           ||
## ||    Strand specific : reversely stranded                                    ||
## ||    Paired-end reads are included.                                          ||
## ||    Total alignments : 1000000                                              ||
## ||    Successfully assigned alignments : 763960 (76.4%)                       ||
## ||    Running time : 0.01 minutes                                             ||
## ||                                                                            ||
## || Process BAM file basal_R2.bam...                                           ||
## ||    Strand specific : reversely stranded                                    ||
## ||    Paired-end reads are included.                                          ||
## ||    Total alignments : 1000000                                              ||
## ||    Successfully assigned alignments : 754176 (75.4%)                       ||
## ||    Running time : 0.01 minutes                                             ||
## ||                                                                            ||
## || Process BAM file basal_R3.bam...                                           ||
## ||    Strand specific : reversely stranded                                    ||
## ||    Paired-end reads are included.                                          ||
## ||    Total alignments : 1000000                                              ||
## ||    Successfully assigned alignments : 767728 (76.8%)                       ||
## ||    Running time : 0.01 minutes                                             ||
## ||                                                                            ||
## || Process BAM file indust1_R1.bam...                                         ||
## ||    Strand specific : reversely stranded                                    ||
## ||    Paired-end reads are included.                                          ||
## ||    Total alignments : 1000000                                              ||
## ||    Successfully assigned alignments : 770775 (77.1%)                       ||
## ||    Running time : 0.01 minutes                                             ||
## ||                                                                            ||
## || Process BAM file indust1_R2.bam...                                         ||
## ||    Strand specific : reversely stranded                                    ||
## ||    Paired-end reads are included.                                          ||
## ||    Total alignments : 1000000                                              ||
## ||    Successfully assigned alignments : 760137 (76.0%)                       ||
## ||    Running time : 0.01 minutes                                             ||
## ||                                                                            ||
## || Process BAM file indust1_R3.bam...                                         ||
## ||    Strand specific : reversely stranded                                    ||
## ||    Paired-end reads are included.                                          ||
## ||    Total alignments : 1000000                                              ||
## ||    Successfully assigned alignments : 756489 (75.6%)                       ||
## ||    Running time : 0.01 minutes                                             ||
## ||                                                                            ||
## || Process BAM file indust2_R1.bam...                                         ||
## ||    Strand specific : reversely stranded                                    ||
## ||    Paired-end reads are included.                                          ||
## ||    Total alignments : 1000000                                              ||
## ||    Successfully assigned alignments : 761090 (76.1%)                       ||
## ||    Running time : 0.01 minutes                                             ||
## ||                                                                            ||
## || Process BAM file indust2_R2.bam...                                         ||
## ||    Strand specific : reversely stranded                                    ||
## ||    Paired-end reads are included.                                          ||
## ||    Total alignments : 1000000                                              ||
## ||    Successfully assigned alignments : 760304 (76.0%)                       ||
## ||    Running time : 0.01 minutes                                             ||
## ||                                                                            ||
## || Process BAM file indust2_R3.bam...                                         ||
## ||    Strand specific : reversely stranded                                    ||
## ||    Paired-end reads are included.                                          ||
## ||    Total alignments : 1000000                                              ||
## ||    Successfully assigned alignments : 756364 (75.6%)                       ||
## ||    Running time : 0.01 minutes                                             ||
## ||                                                                            ||
## || Write the final count table.                                               ||
## || Write the read assignment summary.                                         ||
## ||                                                                            ||
## \\============================================================================//

El objeto data es una lista que contiene varios elementos. Las cuentas de lecturas se han almacenado en ‘counts’.

names(data)
## [1] "counts"     "annotation" "targets"    "stat"
# Extract read counts into a new object 
raw_counts = data$counts

# Example : read counts for a given gene over one condition (three bio. reps)
raw_counts["gene:YAL063C", 7:9]
## indust2_R1.bam indust2_R2.bam indust2_R3.bam 
##              0              0              0

Exercise : Does the mapping match what I see in IGV ?

DESeq workflow

Como punto de partida, DESeq requiere dos objetos con la siguiente información :

  • raw counts of the number of reads aligning each gene
  • associated sample metadata

Comenzaremos creando una tabla de metadatos donde se recoja la información del experimento. Un ejemplo puede verse aquí.

# Create sample vector
sampleID = paste0("SRR93364", 68:76)

# Create condition vector
conditions = rep(c("basal", "indust1", "indust2"), each = 3)

# Create description vector
description = rep(c("pH5_0.04CO2", "pH5_5CO2", "pH3_0.04CO2"), each = 3)

# Create data frame
saccha_metadata <- data.frame(sampleID, conditions, description)

# Assign the row names of the data frame
rownames(saccha_metadata) = paste(conditions, paste0("R",1:3), sep = "_")

# Save the df into a csv file 
dir.create("dea/data")
write.csv(saccha_metadata, file = "dea/data/saccha_metadata.csv")

DESeq2 requires the sample names in the metadata and counts dataset to be in the same order. Therefore, the row names in the metadata need to be in the same order as the column names of the count data (= mismo orden que los ficheros bam).

# column names of the count data 
colnames(raw_counts) = gsub(".bam", "",  colnames(raw_counts))

# row names of the metadata
rownames(saccha_metadata)
## [1] "basal_R1"   "basal_R2"   "basal_R3"   "indust1_R1" "indust1_R2"
## [6] "indust1_R3" "indust2_R1" "indust2_R2" "indust2_R3"
# check identity
identical(colnames(raw_counts), rownames(saccha_metadata))
## [1] TRUE
dds <- DESeqDataSetFromMatrix(countData = raw_counts, 
                              colData = saccha_metadata, 
                              design = ~ conditions)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

Normalize raw counts for library size.
The raw counts for each sample are divided by the associated sample-specific size factor for normalization. To view the size factors used for normalization, we can use the sizeFactors function.

dds <- estimateSizeFactors(dds)
sizeFactors(dds)
##   basal_R1   basal_R2   basal_R3 indust1_R1 indust1_R2 indust1_R3 indust2_R1 
##  1.0575946  1.0486980  1.2369258  1.2574809  1.0282855  1.0269234  0.9417575 
## indust2_R2 indust2_R3 
##  0.8067719  0.8060861
norm_counts <- counts(dds, normalized = TRUE)
# Histogram
par(mfrow=c(1,2))
boxplot(log2(counts(dds, normalized = FALSE)+1), 
        cex = 0.5, cex.axis = .9, main = "non-normalized libraries")
boxplot(log2(norm_counts+1), cex = 0.5, cex.axis = .9, 
        main = "normalized libraries")

# Scatterplot : log normalized counts against each other
log.norm.counts = log2(norm_counts + 1)
plot(log.norm.counts[,1:2], cex=.1)

Unsupervised Clustering Analyses.

# Transform the normalized counts : VST-transformed normalized counts
vst_counts <- vst(dds, blind = TRUE)

# Extract the matrix of transformed counts
vsd <- assay(vst_counts)

# Compute the correlation values between samples
vsd_cor <- cor(vsd) 
# Hierarchical analysis 
BiocManager::install("pheatmap")
pheatmap::pheatmap(vsd_cor, annotation = dplyr::select(saccha_metadata, conditions))

# Plot PCA
plotPCA(vst_counts, intgroup = "conditions")

Model fitting.
Run the DESeq2 analysis : model fitting for gene expression data.

dds <- DESeq(dds)
# Plot dispersions
plotDispEsts(dds)

Model contrast.
Run the DESeq2 model contrast to extract the results.

P5C5vsP5C004 <- results(dds, 
                         contrast=c("conditions", "indust1", "basal"), 
                         alpha = 0.05)

plotMA(P5C5vsP5C004, colSig = "red")

Save results as a data frame.

write.csv(as.data.frame(P5C5vsP5C004), 
          file="dea/results/P5C5vsP5C004_DEG.csv")

Results table.

DESeq2 results table.

mcols(P5C5vsP5C004)
## DataFrame with 6 rows and 2 columns
##                        type            description
##                 <character>            <character>
## baseMean       intermediate mean of normalized c..
## log2FoldChange      results log2 fold change (ML..
## lfcSE               results standard error: cond..
## stat                results Wald statistic: cond..
## pvalue              results Wald test p-value: c..
## padj                results   BH adjusted p-values
summary(P5C5vsP5C004)
## 
## out of 6244 with nonzero total read count
## adjusted p-value < 0.05
## LFC > 0 (up)       : 37, 0.59%
## LFC < 0 (down)     : 42, 0.67%
## outliers [1]       : 0, 0%
## low counts [2]     : 966, 15%
## (mean count < 9)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
head(P5C5vsP5C004)
## log2 fold change (MLE): conditions indust1 vs basal 
## Wald test p-value: conditions indust1 vs basal 
## DataFrame with 6 rows and 6 columns
##                 baseMean log2FoldChange     lfcSE      stat    pvalue      padj
##                <numeric>      <numeric> <numeric> <numeric> <numeric> <numeric>
## gene:YAL069W           0             NA        NA        NA        NA        NA
## gene:YAL068W-A         0             NA        NA        NA        NA        NA
## gene:YAL068C           0             NA        NA        NA        NA        NA
## gene:YAL067W-A         0             NA        NA        NA        NA        NA
## gene:YAL067C           0             NA        NA        NA        NA        NA
## gene:YAL066W           0             NA        NA        NA        NA        NA

Run a model contrast setting a log2 threshold to extract genes with biological relevance.

P5C5vsP5C004 <- results(dds, 
                        contrast=c("conditions", "indust1", "basal"), 
                        alpha = 0.05, 
                        lfcThreshold = 0.32)     # 1.25-fold-change
## Número de genes diferencialmente expresados (P < 0.05) :  26
## Número de genes diferencialmente sobre-expresados (P < 0.05) :  7
## Número de genes diferencialmente inhibidos (P < 0.05) :  19

Resultados genes sobre-expresados :

subset(P5C5vsP5C004, log2FoldChange > 0 & padj < 0.05)
## log2 fold change (MLE): conditions indust1 vs basal 
## Wald test p-value: conditions indust1 vs basal 
## DataFrame with 7 rows and 6 columns
##               baseMean log2FoldChange     lfcSE      stat      pvalue
##              <numeric>      <numeric> <numeric> <numeric>   <numeric>
## gene:YER065C  219.8343        1.41929  0.228886   4.80281 1.56455e-06
## gene:YHR216W   31.6235        1.99321  0.348451   4.80186 1.57199e-06
## gene:YKL029C   77.5936        1.49898  0.301234   3.91383 9.08451e-05
## gene:YKR097W  116.2186        3.02692  0.292078   9.26781 1.90016e-20
## gene:YLR377C  119.0889        2.64950  0.302826   7.69254 1.44243e-14
## gene:YOL155C  692.0430        3.29114  0.588143   5.05173 4.37817e-07
## gene:YPL156C  165.8731        1.29763  0.246557   3.96515 7.33496e-05
##                     padj
##                <numeric>
## gene:YER065C 6.18391e-04
## gene:YHR216W 6.18391e-04
## gene:YKL029C 2.18168e-02
## gene:YKR097W 3.95486e-17
## gene:YLR377C 1.80131e-11
## gene:YOL155C 2.73373e-04
## gene:YPL156C 1.99128e-02

Resultados genes inhibidos :

subset(P5C5vsP5C004, log2FoldChange < 0 & padj < 0.05)
## log2 fold change (MLE): conditions indust1 vs basal 
## Wald test p-value: conditions indust1 vs basal 
## DataFrame with 19 rows and 6 columns
##               baseMean log2FoldChange     lfcSE      stat      pvalue
##              <numeric>      <numeric> <numeric> <numeric>   <numeric>
## gene:YBR068C   630.234       -1.98563  0.230943  -7.21232 5.50045e-13
## gene:YBR117C   194.545       -1.25507  0.236647  -3.95135 7.77120e-05
## gene:YDR461W   820.879       -1.97600  0.198064  -8.36096 6.22091e-17
## gene:YIL117C   114.665       -1.35399  0.263720  -3.92079 8.82597e-05
## gene:YIL015W   121.557       -1.56497  0.250900  -4.96202 6.97627e-07
## ...                ...            ...       ...       ...         ...
## gene:YNR030W  138.0639       -1.28392  0.240581  -4.00661 6.15957e-05
## gene:YNR044W   99.6903       -1.59479  0.265567  -4.80026 1.58460e-06
## gene:YPL223C  844.8084       -1.15058  0.198634  -4.18147 2.89624e-05
## gene:YPL187W  156.2926       -3.33751  0.736083  -4.09942 4.14182e-05
## gene:YPL058C 1130.3495       -4.14848  0.303836 -12.60049 2.09849e-36
##                     padj
##                <numeric>
## gene:YBR068C 5.72414e-10
## gene:YBR117C 2.02181e-02
## gene:YDR461W 9.71084e-14
## gene:YIL117C 2.18168e-02
## gene:YIL015W 3.95998e-04
## ...                  ...
## gene:YNR030W 1.74820e-02
## gene:YNR044W 6.18391e-04
## gene:YPL223C 9.51797e-03
## gene:YPL187W 1.23150e-02
## gene:YPL058C 6.55148e-33

Visualización de Resultados.
Visualización : MA plot

plotMA(P5C5vsP5C004, colSig = "red")

Visualización : heatmap

# Subset normalized counts to significant genes
degs <- rownames(subset(P5C5vsP5C004, padj < 0.05))
sig_norm_counts <- norm_counts[degs, ] # tabla final para heatmap


# Choose a color palette from RColorBrewer
library(RColorBrewer)
heat_colors <- brewer.pal(6, "YlOrRd")

pheatmap::pheatmap(sig_norm_counts, 
                   color = heat_colors, 
                   cluster_rows = TRUE, 
                   show_rownames = FALSE, 
                   annotation = dplyr::select(saccha_metadata, conditions), 
                   scale = "row")

DEGs annotation.
Download genome annotation in tabular format from NCBI.

# Download genome annotation 
download.file("https://www.dropbox.com/s/clp3eson00mlca3/sacCer_annotation.csv?dl=1", 
              destfile = "dea/data/saccha_annotation.csv")
# Read the annotation
annot <- read.csv("dea/data/saccha_annotation.csv")    

La tabla de DEGs la obtenemos del siguiente modo :

deg_res <- subset(P5C5vsP5C004, padj < 0.05)[degs, ]

El objeto deg_res es de tipo DESeq2 pero necesitamos convertirlo a data.frame para aplicar una función que definiremos a continuación.

class((deg_res))
## [1] "DESeqResults"
## attr(,"package")
## [1] "DESeq2"
deg_res = as.data.frame(deg_res)
class((deg_res))
## [1] "data.frame"

Call the function final_table and get a final table of DEGS with their annotation.

res_final <- final_table(deg_res)
res_final %>% arrange(desc(log2FoldChange))
## # A tibble: 26 × 16
##    gene    Protein.Name    baseMean log2FoldChange lfcSE  stat   pvalue     padj
##    <chr>   <chr>              <dbl>          <dbl> <dbl> <dbl>    <dbl>    <dbl>
##  1 YOL155C mannoprotein       692.            3.29 0.588  5.05 4.38e- 7 2.73e- 4
##  2 YKR097W phosphoenolpyr…    116.            3.03 0.292  9.27 1.90e-20 3.95e-17
##  3 YLR377C fructose 1,6-b…    119.            2.65 0.303  7.69 1.44e-14 1.80e-11
##  4 YHR216W IMP dehydrogen…     31.6           1.99 0.348  4.80 1.57e- 6 6.18e- 4
##  5 YKL029C malate dehydro…     77.6           1.50 0.301  3.91 9.08e- 5 2.18e- 2
##  6 YER065C isocitrate lya…    220.            1.42 0.229  4.80 1.56e- 6 6.18e- 4
##  7 YPL156C pheromone-regu…    166.            1.30 0.247  3.97 7.33e- 5 1.99e- 2
##  8 YPL223C Gre1p              845.           -1.15 0.199 -4.18 2.90e- 5 9.52e- 3
##  9 YBR117C transketolase …    195.           -1.26 0.237 -3.95 7.77e- 5 2.02e- 2
## 10 YNR030W dolichyl-P-Man…    138.           -1.28 0.241 -4.01 6.16e- 5 1.75e- 2
## # ℹ 16 more rows
## # ℹ 8 more variables: chr <chr>, Accession <chr>, Start <int>, Stop <int>,
## #   Strand <chr>, Locus <chr>, Protein.product <chr>, Length <int>